AITopics | small weight

Pay Attention to Small Weights

Neural Information Processing SystemsJun-22-2026, 22:04:24 GMT

Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, the criterion is gradient-free--the parameter subset can be determined without gradient computation; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; thirdly, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)
Overview (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(2 more...)

Add feedback

e1fe6165cad3f7f3f57d409f78e4415f-Paper.pdf

Neural Information Processing SystemsFeb-10-2026, 19:50:49 GMT

neural network, poly, threshold circuit, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Pay Attention to Small Weights

Zhou, Chao, Jacobs, Tom, Gadhikar, Advait, Burkholz, Rebekka

arXiv.org Artificial IntelligenceOct-23-2025

Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, this criterion is gradient-free -- the parameter subset can be determined without gradient computation; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; thirdly, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.21374

Country: Europe (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(2 more...)

Add feedback

e1fe6165cad3f7f3f57d409f78e4415f-Paper.pdf

Neural Information Processing SystemsAug-16-2025, 22:59:27 GMT

neural network, poly, threshold circuit, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Reviews: Extracting Relationships by Multi-Domain Matching

Neural Information Processing SystemsOct-7-2024, 07:38:03 GMT

Title: Extracting Relationships by Multi-Domain Matching Summary Assuming that a corpus is compiled from many sources belonging to different to domains, of which only a strict subset of domains is suitable to learn how to do prediction in a target domain, this paper proposes a novel approach (called Multiple Domain Matching Network (MDMN)) that aims at learning which domains share strong statistical relationships, and which source domains are best at supporting to learn the target domain prediction tasks. While many approaches to multiple-domain adaptation aim to match the feature-space distribution of *every* source domain to that of the target space, this paper suggests to not only map the distribution between sources and target, but also *within* source domains. The latter allows for identifying subsets of source domains that share a strong statistical relationship. Strengths Paper provides a theoretical analysis that yields a tighter bound on the weighted multi-source discrepancy. Weaknesses Tighter bound on multi-source discrepancy depends on the assumption that source domains that are less relevant for the target domain have lower weights.

extracting relationship, source domain, target domain, (7 more...)

Neural Information Processing Systems

Genre: Research Report (0.58)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.38)

Add feedback

Neural Computing with Small Weights

Neural Information Processing SystemsApr-6-2023, 19:17:57 GMT

An important issue in neural computation is the dynamic range of weights in the neural networks. Many experimental results on learning indicate that the weights in the networks can grow prohibitively large with the size of the inputs. Here we address this issue by studying the tradeoffs between the depth and the size of weights in polynomial-size networks of linear threshold elements (LTEs). We show that there is an efficient way of simulating a network of LTEs with large weights by a network of LTEs with small weights. In particular, we prove that every depth-d, polynomial-size network of LTEs with exponentially large integer weights can be simulated by a depth-(2d 1), polynomial-size network of LTEs with polynomially bounded integer weights.

neural computing, polynomial-size network, small weight, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.81)

Add feedback

A Tighter Bound for Graphical Models

Leisink, Martijn A. R., Kappen, Hilbert J.

Neural Information Processing SystemsDec-31-2001

The neurons in these networks are the random variables, whereas the connections between them model the causal dependencies. Usually, some of the nodes have a direct relation with the random variables in the problem and are called'visibles'. The other nodes, known as'hiddens', are used to model more complex probability distributions. Learning in graphical models can be done as long as the likelihood that the visibles correspond to a pattern in the data set, can be computed. In general the time it takes, scales exponentially with the number of hidden neurons.

approximation, boltzmann machine, partition function, (16 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > Gelderland > Nijmegen (0.05)
North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.36)

Add feedback

A Tighter Bound for Graphical Models

Leisink, Martijn A. R., Kappen, Hilbert J.

Neural Information Processing SystemsDec-31-2001

The neurons in these networks are the random variables, whereas the connections between them model the causal dependencies. Usually, some of the nodes have a direct relation with the random variables in the problem and are called'visibles'. The other nodes, known as'hiddens', are used to model more complex probability distributions. Learning in graphical models can be done as long as the likelihood that the visibles correspond to a pattern in the data set, can be computed. In general the time it takes, scales exponentially with the number of hidden neurons.

approximation, boltzmann machine, partition function, (16 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > Gelderland > Nijmegen (0.05)
North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.36)

Add feedback

A Tighter Bound for Graphical Models

Leisink, Martijn A. R., Kappen, Hilbert J.

Neural Information Processing SystemsDec-31-2001

Theneurons in these networks are the random variables, whereas the connections between them model the causal dependencies. Usually, some of the nodes have a direct relation with the random variables in the problem and are called'visibles'. The other nodes, known as'hiddens', are used to model more complex probability distributions. Learning in graphical models can be done as long as the likelihood that the visibles correspond to a pattern in the data set, can be computed. In general the time it takes, scales exponentially with the number of hidden neurons.

approximation, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe > Netherlands (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.35)

Add feedback

On Neural Networks with Minimal Weights

Bohossian, Vasken, Bruck, Jehoshua

Neural Information Processing SystemsDec-31-1996

Linear threshold elements are the basic building blocks of artificial neural networks. A linear threshold element computes a function that is a sign of a weighted sum of the input variables. The weights are arbitrary integers; actually, they can be very big integers-exponential in the number of the input variables. However, in practice, it is difficult to implement big weights. In the present literature a distinction is made between the two extreme cases: linear threshold functions with polynomial-size weights as opposed to those with exponential-size weights.

linear threshold function, vector, weight vector, (14 more...)

Neural Information Processing Systems

Country: